Search Engine using Apache Lucene
Abstract
The World Wide Web is a huge network of billions of workstations, and it contains billions of web pages on a wide variety of topics. People discuss many of these topics and share opinions and suggestions on social networking sites that users are interested in. Current search engines still suffer from low precision and low recall, so an effective search engine that applies Web mining technology has become very important. This paper discusses the technologies used to implement a search engine and its core techniques, such as indexing and searching the World Wide Web. The authors describe a method for building a search engine using the JSoup and Apache Lucene APIs.
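To make the indexing and searching steps mentioned in the abstract more concrete, here is a minimal sketch of the underlying idea, an inverted index. It is an illustration only and not the authors' implementation: it uses neither JSoup nor Lucene, the class name TinySearch and its methods are hypothetical, the regular-expression tag stripping is a crude stand-in for a real HTML parser, and the term-frequency score is far simpler than the ranking a real search library applies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index. Hypothetical class: NOT the Lucene or JSoup API. */
public class TinySearch {
    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    /** Crude stand-in for HTML parsing: drop tags, lowercase, split on non-alphanumerics. */
    public static List<String> tokenize(String html) {
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase();
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Indexing: record how often each term occurs in the given document. */
    public void add(String docId, String html) {
        for (String term : tokenize(html)) {
            postings.computeIfAbsent(term, k -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    /** Searching: sum term frequencies over the query terms, best score first. */
    public List<String> search(String query) {
        // TreeMap keeps ids sorted, so ties in score come out in id order.
        Map<String, Integer> scores = new TreeMap<>();
        for (String term : tokenize(query)) {
            Map<String, Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term never indexed
            }
            for (Map.Entry<String, Integer> e : docs.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> scores.get(b) - scores.get(a)); // stable sort
        return ids;
    }

    public static void main(String[] args) {
        TinySearch index = new TinySearch();
        index.add("page1", "<h1>Lucene</h1><p>Lucene builds an inverted index.</p>");
        index.add("page2", "<p>JSoup parses HTML before the index is built.</p>");
        System.out.println(index.search("lucene index")); // prints [page1, page2]
    }
}
```

In a system like the one the abstract describes, the tokenize step would presumably be handled by an HTML parser such as JSoup, and the postings map and scoring by Lucene's index and query machinery; the sketch only shows why a term-to-documents map makes lookup fast.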
Similar resources
Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned
For the past few years, we used Apache Lucene as recommendation framework in our scholarly-literature recommender system of the reference-management software Docear. In this paper, we share three lessons learned from our work with Lucene. First, recommendations with relevance scores below 0.025 tend to have significantly lower click-through rates than recommendations with relevance scores above...
Watsonsim: Overview of a Question Answering Engine
The objective of the project is to design and run a system to answer Jeopardy questions, similar to Watson. In the course of a semester, we developed an open source question answering system using the Indri, Lucene, Bing and Google search engines, Apache UIMA, OpenNLP, and Weka among many additional modules. By the end of the semester, we achieved 18% accuracy on Jeopardy questions, and work ha...
Indexing and Searching Mathematics in Digital Libraries
This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. The design and architecture of our MIaS (Math Indexer and Searcher) system is presented, and our design decisions are discussed in detail. An approach based on Presentation MathML using a similarity of math subformulae is suggested and verified by implementing it as a math-aware...
Phrasal Queries with LingPipe and Lucene: Ad Hoc Genomics Text Retrieval
The hypothesis we explored for the Ad Hoc task of the Genomics track for TREC 2004 was that phrase-level queries would increase precision over a baseline of token-level terms. We implemented our approach using two open source tools: the Apache Jakarta Lucene TF/IDF search engine (version 1.3) and the Alias-i LingPipe tokenizer and named-entity annotator (version 1.0.6). Contrary to our intuition...
Generic XML-based Framework for Metadata
We present a generic and flexible framework for building geoscientific metadata portals independent of content standards for metadata and protocols. Data can be harvested with commonly used protocols (e.g., Open Archives Initiative Protocol for Metadata Harvesting) and metadata standards like DIF or ISO 19115. The new Java-based portal software supports any XML encoding and makes metadata searc...
Publication date: 2016